In this walkthrough example, I’ll consider the stillborn variable and investigate the role of litter size (totalborn) in the number of stillborn piglets, controlling for parity and field
parity is the number of times the sow has become pregnant and given birth to a litter
field is an indicator of which sows were housed in the same field
Generalised Linear Models
Generalised Linear Models (GLMs) are a powerful extension of the linear regression model: they extend the types of data and conditional distributions that can be modelled beyond the normal (Gaussian) distribution of linear regression
Binary (dichotomous) variables can be modelled using logistic regression
Count data can be modelled using Poisson regression
Milk yield, animal weight, or other positive, continuous variables can be modelled using a gamma regression
All are special cases of the broad class of GLMs
Also includes linear regression as a special case
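The special-case relationship can be checked directly in R. The following is a minimal sketch on simulated data (the variables `x` and `y` are illustrative, not from the piglet data): a GLM with a Gaussian family and identity link should reproduce the `lm()` coefficients. The other cases listed above would use `family = binomial()`, `family = poisson()`, or `family = Gamma()` in the same `glm()` call.

```r
# Simulated data; variable names are illustrative only
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)

# Ordinary linear regression
m_lm <- lm(y ~ x)

# The same model written as a GLM: Gaussian family, identity link
m_glm <- glm(y ~ x, family = gaussian(link = "identity"))

# Compare the coefficient estimates from the two fits
same_fit <- isTRUE(all.equal(coef(m_lm), coef(m_glm)))
same_fit
```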
Generalised Linear Models
GLMs allow the conditional distribution of the response to be any distribution from the exponential family: Poisson, binomial, Gaussian, gamma, multinomial, …
There are three parts to a GLM
The conditional distribution of \(y\)
The linear predictor \(\eta\), and
The link function
Whilst this affords a wealth of options, natural choices for the conditional distribution of \(y\) and the link function often arise from the type of data being modelled
In a GLM we want a model for the expectation of \(y\), \(\mathbb{E}(y_i)\), which here will usually be the mean of the response, \(\mu_i\)
We might model \(y_i\) as following a Poisson distribution with mean \(\mu_i\) if the data were counts, or a binomial distribution in the case of 0/1 data, as we did in the high SCC example
GLM: Linear predictor
We need to decide which predictor variables and any transformations of them should be used to predict \(y\); this is the linear predictor, \(\eta\)
\[\eta_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi}\]
However, as we saw in the Gaussian linear model, there is nothing stopping the above equation returning values that don’t make sense for the variable we are modelling
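A small numerical illustration of the problem, using made-up coefficients (`beta0` and `beta1` are hypothetical values chosen only for this sketch): a linear predictor is unbounded, so over a modest range of a covariate it can produce values that are impossible for a count (negative) or for a probability (outside 0 to 1).

```r
# Hypothetical coefficients, chosen only to illustrate the problem
beta0 <- 2
beta1 <- -0.5
x <- 0:10

# The linear predictor is an unbounded linear function of x
eta <- beta0 + beta1 * x
range(eta)

# Some values are negative: impossible if eta were an expected count
any(eta < 0)

# Some values exceed 1: impossible if eta were a probability
any(eta > 1)
```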
GLM: Link Function
So we need to map the values from the response scale onto the linear predictor scale, just as we mapped from probabilities to log-odds using the logit transformation. This is the link function, \(g()\)
The link function maps from the response scale to the linear predictor scale
\[g(\mu_i) = \eta_i\]
To map from the linear predictor scale back to the response scale we need to undo \(g()\), which means we need the inverse of \(g\), \(g^{-1}\):
\[\mu_i = g^{-1}(\eta_i)\]
The inverse of log is exp, of logit is expit, etc. but R handles this for you
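To make the link and its inverse concrete, here is a short sketch using functions from base R's stats package: `qlogis()` and `plogis()` are the logit and expit functions, and every family object carries its own `linkfun` and `linkinv`. The numeric values are arbitrary examples. When predicting from a fitted GLM, `predict(..., type = "response")` applies the inverse link for you.

```r
# logit and its inverse (expit) on some example probabilities
p <- c(0.1, 0.5, 0.9)
eta_logit <- qlogis(p)        # g(mu): probability -> log-odds
p_back <- plogis(eta_logit)   # g^{-1}(eta): log-odds -> probability

# The same functions are stored in the binomial family object
fam_binom <- binomial(link = "logit")
eta_binom <- fam_binom$linkfun(p)

# log link and its inverse (exp) from the Poisson family object
fam_pois <- poisson(link = "log")
mu <- c(0.5, 2, 10)
eta_log <- fam_pois$linkfun(mu)     # log(mu)
mu_back <- fam_pois$linkinv(eta_log) # exp(eta)
```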
GLM: Link Function
Sounds complicated but think back to the logistic regression model we discussed on Friday
m_still_p |>
  plot_comparisons(
    variables = list(parity = "sequential"), # which variable to compare
    condition = "totalborn"                  # which variable to condition on
  )
Interaction
However, the model only encodes an additive effect of parity: add an interaction effect
m_still_p_int |>
  plot_comparisons(
    variables = list(parity = "sequential"), # which variable to compare
    condition = "totalborn"                  # which variable to condition on
  )
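How an interaction might be added is sketched below on simulated data, since the model formulas are not shown here. Everything in the block is an assumption for illustration: the data frame `pigs_sim`, the model names `m_add` and `m_int`, the simulation coefficients, and the choice of a litter size by parity interaction. The `field` term is left out to keep the sketch short.

```r
# Simulated stand-in for the piglet data (all values are made up)
set.seed(42)
n <- 400
pigs_sim <- data.frame(
  totalborn = sample(6:20, n, replace = TRUE),
  parity    = factor(sample(1:4, n, replace = TRUE))
)
log_mu <- -2.5 + 0.12 * pigs_sim$totalborn +
  0.1 * (as.integer(pigs_sim$parity) - 1)
pigs_sim$stillborn <- rpois(n, exp(log_mu))

# Additive Poisson GLM: parity shifts the log mean by a constant
m_add <- glm(stillborn ~ totalborn + parity,
             family = poisson(link = "log"), data = pigs_sim)

# Interaction model: the litter size effect may differ by parity
m_int <- update(m_add, . ~ . + totalborn:parity)

# The extra coefficients introduced by the interaction
int_terms <- grep("totalborn:parity", names(coef(m_int)), value = TRUE)
int_terms

# Likelihood ratio test of the interaction against the additive model
anova(m_add, m_int, test = "Chisq")
```

Because the simulated data contain no true interaction, the test here is not expected to favour `m_int`; with real data the comparison is informative.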
Comparison III — litter size
Main question is in terms of litter size (totalborn)
Replicate data and set litter size = 10 and 15
Predict, compare predictions at both litter sizes, average, test
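The replicate, predict, compare, and average steps can be written out by hand in base R. This is a sketch on simulated data, not the analysis itself: `pigs_sim`, the model `m`, and the simulation coefficients are hypothetical, and `field` is omitted. It stops short of the test step, which needs a standard error for the averaged difference. In marginaleffects, a call along the lines of `avg_comparisons(m, variables = list(totalborn = c(10, 15)))` is intended to carry out these steps and add the standard error and test; check the package documentation for the exact arguments.

```r
# Simulated stand-in for the piglet data (all values are made up)
set.seed(42)
n <- 400
pigs_sim <- data.frame(
  totalborn = sample(6:20, n, replace = TRUE),
  parity    = factor(sample(1:4, n, replace = TRUE))
)
log_mu <- -2.5 + 0.12 * pigs_sim$totalborn +
  0.1 * (as.integer(pigs_sim$parity) - 1)
pigs_sim$stillborn <- rpois(n, exp(log_mu))

m <- glm(stillborn ~ totalborn + parity,
         family = poisson(link = "log"), data = pigs_sim)

# Replicate the data, fixing litter size at 10 and at 15 for every sow
d10 <- transform(pigs_sim, totalborn = 10)
d15 <- transform(pigs_sim, totalborn = 15)

# Predict on the response scale (expected number of stillborn piglets)
p10 <- predict(m, newdata = d10, type = "response")
p15 <- predict(m, newdata = d15, type = "response")

# Compare the two predictions for each sow, then average
diffs <- p15 - p10
avg_diff <- mean(diffs)
avg_diff
```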